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Abstract 

In Quasi-Monte Carlo integration, the integration error is believed to be 
generally smaller than in classical Monte Carlo with the same number of 
integration points. Using an appropriate definition of an ensemble of quasi- 
random point sets, we derive various results on the probability distribution of 
the integration error, which can be compared to the standard Central Limit 
Theorem for normal stochastic sampling. In many cases, a Gaussian error 
distribution is obtained. 
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1 Introduction 



It is widely held that Quasi-Monte Carlo integration, in which the integra- 
tion points are distributed more uniformly than in classical Monte Carlo 
integration which uses truly (or approximately) random points, can lead to 
potentially much smaller integration errors for the same amount of effort 
(i.e. the same number of integrand evaluations). A number of theorems are 
known that relate information on the fluctuating behaviour of the integrand 
(such as variation, modulus of continuity, etc.) and information on the de- 
gree of uniformity of the point set employed (in terms of some quantitative 
notion of discrepancy) to the integration error |IJ . These results, however, do 
not easily lend themselves to practical error estimates and moreover, being 
usually upper limits, may be too pessimistic in many applications. 

This situation is to be contrasted to that in classical Monte Carlo inte- 
gration: there, one settles for a probabilistic error estimate, which on the one 
hand does not give perfectly certain information but only confidence levels, 
but on the other hand can be easily computed by estimating not only the 
integral but at the same time the variance of the integrand. The essential 
point in this procedure is the existence of the Central Limit Theorem, which 
states that for a large number N of randomly chosen integration points, the 
integration error has an approximately Gaussian distribution with zero mean 
and a standard deviation related to the integrand's variance. The estimation 
of this single parameter therefore suffices to determine the shape of the error 
distribution. 

In this paper, we attempt to derive results similar to the Central Limit 
Theorem, for the case of Quasi-Monte Carlo. In previous publications |2|, || 
we have argued that such considerations require a definition of what con- 
stitutes an ensemble of iV-point quasi-random point sets. For truly random 
points, this is an easy problem since we may simply assume the points to be 
iid uniformly over the integration region. For quasi-random points the situ- 
ation is somewhat more subtle. We propose to use the fact that such more 
evenly distributed point sets are generally characterized by a low value of 
discrepancy: given some definition of discrepancy (we shall specify one later 
on), we restrict ourselves to the set of iV-point point sets in which the points 
are all uniformly iid, but with the additional condition that the discrepancy 
has a particular value s (by suitable integration over s, we shall of course ob- 
tain again the classical results for truly random points). We can then study 
the distribution of the integration error over this ensemble of point sets. 
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The lay-out of this paper is as follows. In section 2 we establish some 
notation and define our point set ensemble. In section 3 we derive our main 
result on the distribution of the integration error, in terms of a single com- 
plex integral. In section 4, we present explicit results for a particular, simple 
definition of discrepancy. In section 5 we attempt to do the same for what 
we believe constitutes a realistic discrepancy. In each case we aim at ar- 
riving at an error distribution that depends on only a single parameter (so 
that confidence levels for the integration result can easily be computed), and 
ultimately, of course, the ideal Gaussian error distribution. 



2 Notation and definitions 

Our integration region will always be the D-dimensional hypercube K = 
[0, 1) D , containing the point set X N = {x±, x 2 , ■ ■ ■ , x^}. Where necessary, we 
shall denote the individual components of the vector x& with Greek indices: 
so, Xk = = (xl,xl, . . . ,x®). Let the integrand be denoted by f(x); we 
assume, for simplicity, that the moments 

j p = Jdx f( x y (i) 

K 

exist at least for the first few values of p. The numerical integral estimate is 
given by 

1 N 

S =nT, /(**) , (2) 

k=l 

and the integration error i] is then, of course, 

V = S-J 1 . (3) 

It is the probability distribution of i] over the ensemble of point sets X N 
which is our object of concern here. 

We now turn to the definition of a discrepancy. We introduce the Fourier 
base of orthonormal function as follows. Starting with D — 1, we define 

u 2n -i(x) = v^2sin(27rna;) , u 2n (x) = \/2 cos(27rnx) , (4) 

for n = 1,2,3,..., and u (x) = 1. In more dimensions, we define vectors 
n = n M = (n 1 , n 2 , . . . , n D ) with integer, non-negative components, and write 

D 

Un{x) = ]J • (5) 

n=i 
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We assume that the integrand / can be decomposed into its various Fourier 
modes as follows: 

f( x ) = J2v H u H (x) , (6) 

n 

from which it immediately follows that 

Jl = v (0,0,...,0) , V = J2 V H = J 2 - J l ■ ( 7 ) 

n>0 

Here and in the following, the notation n > means a sum over all vectors n 
except the null vector (0, 0, . . . , 0). Quadratic integrability of the integrand 
requires that the variance V, i.e. the sum of the v%, converges. 

To each mode with wave vector n we associate a strength a ft. In ]2[ and 
we relate these strengths to a definition of an ensemble of integrands, by 
letting every be normally distributed with zero mean and width a^, but 
here we do not have to assume a particular such ensemble. The definition of 
(quadratic) discrepancy that we propose to use is 

1 N 

D N (X N ) = — /3(x k ,xi) , P(x k ,x l ) = ^ i o%Urt(xk)urt(xi) . (8) 

^ V k,l=l n>0 

An essential property is that 



dx k f3(x k ,xi) = j dxi (3(x k ,xi) = . (9) 

K K 

Another important assumption is that of translational invariance, by which 
the sines and cosines of each particular wave component have equal strength: 

0"(2n 1 ,2n 2 ,...,2n D ) = ^{2^ -l,2n 2 ,...,2n D ) = ® {2n> ,2n 2 -l,...,2n D ) = 

— "•• — °"(2n 1 -l,2n 2 -l,...,2n D -l) ■ (10) 

One of the consequences of this choice is that P(x k , x{) only depends on the 
difference x k — xi, and therefore 

(3(0) = JdxP{x,x) = y £°n ■ (11) 



K 



n>0 



Hence, for truly random points the expected value of the discrepancy is 

(D N (X N )} = £ al , (12) 

n>0 
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and of course we assume this sum to converge. For the particular point set X N 
we are employing, we assume the discrepancy D N (X N ) to have a known value 
s, non-negative by construction. Super-uniform, or quasi-random, point sets 
are distinguished by the fact that s is small compared to its expectation for 
random point sets. 

We now come to the definition of an ensemble of quasi-random point 
sets. We consider it to consist of all point sets that have a value s of the 
above discrepancy, but are otherwise unrestricted. The combined probability 
density Pjv for the N points Xk is then given by 

6 (D N (x 1 ,x 2 , ■ ■ .,x N ) - s) 
H (s) 

J dx\ ■ ■ ■ dxN 5 (-Djv(xi, X2, ■ ■ ■ , xn) — s) .(13) 

K 

The number H (s) serves to normalize the probability density P^'- it is 
nothing but the probability for a set of truly random points to attain the 
value s for its discrepancy. Indeed, we trivially have 

oo 

J ds H (s)P N (s;x 1 ,x 2 ,. . . ,x N ) = 1 . (14) 
o 



P N (s;x 1 ,x 2 , ...,x N ) 
H (s) 



3 The error distribution 

We now start to work our way towards a Central Limit Theorem for Quasi- 
Monte Carlo, assuming the point set Xn to be a member of the ensemble 
constructed above. Let P(s; 77) be the probability density of the integration 
error rj over the ensemble of possible point sets Xjy. We may write 

P(s;r}) = J dxi ■ ■ ■ dx N P N (s;xi, . . . ,x N ) 5 (rj — S + Ji) . (15) 



K 



Using the definition of the Dirac delta distributions in Eqs.([TB],|r5|) as Fourier 
integrals, we may write this as 



(OO 
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M(z,t) 



r ( z N t N \ 

/ dx x ■ ■ ■ dx N exp — f(xk) + T? H Pfak, x i) 



K 



]T -r M m (z) , (16) 



m>0 



where the integration contours for t and z run to the left of any singularities. 

In the spirit of the classical Central Limit Theorem, we must now proceed 
to take the asymptotic limit iV — > oo in a careful manner, taking into account 
that the dominant part of the z integral comes from the region where z is of 
order O (v^) • The procedure is most easily illustrated by considering the 
first few powers of t. To start, we have 

£/(**)/* / tl WArV iv 



Mq(z) = / dx\ ■ ■ ■ dxpf e k = (e 

K 



■ju>/Ny 



-p(--Ji + ^W-J?) + o(^)) ■ (17) 



Due to Eq.([]) the next contribution evaluates as follows: 

k,l 



M^z) = Jdxi---dx N e * —}_^f3(x k ,xi) 

K k > 1 



2N 

V 
N 

M (z) I / dx P(x,x) + 



K 



+ (g./W/tf y*- 1 ^ fdxe z ^ N {3(x,x) 



K 



+ j^J dxidx 2 f(x 1 )P(x 1 ,x 2 )f(x 2 ) j , (18) 

K / 

where we have suppressed all subleading terms. The higher-order terms can 
easily be worked out: the only combinations that survive in the limit N — > oo 
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C k = / dx x dx 2 ■ ■ -dx k P(xi,x 2 )P(x 2 ,x 3 ) ■ ■ ■ P(x k -i, x k )P(x k , Xi) 



arc 



K 

= E^ fc , (19) 

n>0 

and 

F k = J dxidx 2 ■ ■ -dx k dx k+1 f(x 1 )(3(x 1 ,x 2 ) ■ ■ ■ (3(x kl x k+1 )f(x k+1 ) 

= E vlof . (20) 

n>0 

These objects come with topological symmetry factors of 2 k /(2k) and 2 k /2, 
respectively @. To leading order in N, we can therefore write 

k J2 t^+\k 



(2t) k z* (2t) 



M(z,t) ~ M (^)exp E^Hf" +vE f ' 



Mo(,)e XP |EE^4EE^F 

^>0n>0 fc>0n>0 

n>0 n>0 



Combining everything, we have 



P{s; V ) 



dz dt 



Hn(s) J 2ni 2ni 
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—too 



1^^(1-2^) + ^)) , 

n>0 / 

Bit) = Y ^—y . (22) 



x exp I — :// — f.s 

.,,2 



The z integral converges provided ReB(t) > 0, which certainly holds if 1 — 
2cr|Ret > for all n. Performing the z integration, we arrive at our master 
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formula: 




p&v) = ~~Er~r~\ 

Hn(s) 

—ioo 

x exp (-ts - I £ log (l - 2ta 2 H ) - ^ | . (23) 

We see that, for the types of discrepancy discussed here, the error distri- 
bution is symmetric around 77 = 0. Its precise form, however, will depend 
on our choice for the a^. As we have said, a particular such choice reflects 
our belief about which kind of function class our actual integrand is a typical 
member of: but it must be realized that we are, in fact, allowed to take any 
choice for the that satisfies X)cr| < 00. A choice that does not 'fit' the 
behaviour of f(x) too well will just result in a somewhat worse error esti- 
mate: but the error distribution itself is only based on our assumption on the 
ensemble of point sets Xjy, and not on any assumption about the integrand 
apart from its quadratic integrability. 

From Eq. (|2~3"D a number of results immediately follow. In the first place, 
we can recover the case of truly random point sets by simply averaging over 
all possible values of s, with the appropriate probability distributionif ( s ) : 
this immediately leads to 







dsH (s)P(s; V ) = J^exp(-^\ , (24) 



which is the standard Central Limit Theorem. Another result comes from 
the normalization of P(s; rj): upon integrating over 7] we find 

ioo / \ 

Ho(s) = J ^ exp l-ts - - £ log (l - 2to$) , (25) 

in accordance with Ref.0. A final observation to be made is that the error 
7] only occurs in the combination r] 2 N. From this it immediately follows 
that, all other things being equal, the error will only decrease as 1/y/N. 
Any improved rate of convergence is therefore solely due to a decrease of the 
discrepancy value s with N. 
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4 A simple model: uniform strengths 



The first, and simplest, model that we shall consider is that where 2M of the 
<r| are equal to 1/2M, and all the other ones vanish. It is natural to take 
for the nonzero modes the ones with the lowest frequencies (i.e. small values 
of the components of if), but this is not necessary. As mentioned above, 
the choice of an only establishes which modes are covered, that is, enter in 
the computation of the discrepancy: a general integrand will, of course have 
modes with different frequencies, which are not covered. We therefore write 

V = "£v^ = Vi + V 3 , (26) 

n>0 

where V\ contains the 2M covered modes, for which a ft 7^ 0, and V% contains 
all the other, uncovered, ones. The larger V\ is with respect to V 2 , the better 
our discrepancy model 'fits' the integrand. We immediately have 



^X>g(l-2to3) = Mlog(l- 



t 

2f-l~~°v- -~ n J °V" M 

n>0 

Bit) = Vl/ (l-j^+V,, 
M M 




(27) 

where the last line holds for large M. Both the form of Hq(s) and that 
of j3(xk,xi) for this model are given in 0; by construction, the expected 
discrepancy for truly random points is (s) = 1. The master formula now 
becomes 



x T(M) T dx N 
P{s;t]) -- 



M M ~ l J 2m V 27r(V 2 + sV±/x) 



-too 



x exp [Mx -Mhgx - x^T7 r ) , (28) 
\ 2(V 2 + sVi/x) J 

where we have written x = s(l — t/M). Consequently the integration contour 
must cross the positive real axis. Two special cases can immediately be 
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derived from this. In the first place, suppose that we had chosen the nonzero 
in a very bad way, such that V\ = 0: that is, the integrand consists only 
of uncovered modes. It then follows immediately that 



p(s; '<>i*W^ exp (~w) • (29) 

which is the standard Central Limit Theorem. In this case, nothing is really 
lost, and the error estimate is just as good (or bad) as in classical Monte 
Carlo. On the other hand, if the integrand consists only of covered modes, 
so that Vi = 0, we find after some straightforward manipulations: 



P{s-ri) 




o AT \ M-3/2 



2ttVis V 2V l sM 



T(M) 2 / 1 \ 

5 < M > = 7mmr±T) = 1+0 (m) • < 30 > 

with the strict constraint rfN < 2V\sM. This follows from the fact that, if 
this inequality is violated, the complex integration contour for x can be closed 
to the right, where the integrand has no singularities; for the same reason 0, 
H (s) vanishes for s < 0. Note that, for this particular discrepancy, s can 
actually vanish: this happens in one dimension, if the point set is equidistant 
and N > M. In that case, rj is always zero, so that the function is integrated 
exactly. This is just another instance of the Nyqvist theorem [|J. 

For general V\ and V2, we may consider the case where M becomes large. 
The integral can then be approximated by the saddle-point method. The 
saddle point is located at x = 1 + O (1/M), and we find 



/ N / r] 2 N \ 

Pis ' r > ) ~i 2*(V 1 s + V 2 ) eXP {- 2(V lS + V 2 ) ) ■ (M) 

Again, we recover a Gaussian limiting distribution; its width is no longer 
parameterized by V — Vi + V% but rather by V\s + V^. the information we 
have gathered by computing the discrepancy s is seen to result in a reduced 
error, depending on how much of the fluctuating behaviour of the integrand is 
actually covered by the modes entering in the discrepancy. The limit of large 
M is actually justified by a self-consistency argument: the error distribution 
(|31| ) heavily suppresses the region rfN ^> 2{Vis + V2), so that (as can also 
be gleaned from Eq.([30l)) M does not actually have to be a huge number 
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for the saddle-point approximation to work. Note, moreover, that if we only 
allow the lowest frequency mode in each dimension, that is, only n M = 0, 1, 2 
for each component of n, M already equals (3 D — l)/2 which grows very 
rapidly with increasing D. The upshot of this (admittedly simple-minded, 
but nevertheless possible) model is: first, that we may hope for an error 
distribution which tends to a Gaussian (especially in high dimension), and, 
secondly, the width of this distribution depends on the discrepancy s in a 
manner which depends on the degree in which the relevant modes of the 
integrand correspond to those used in the evaluation of the discrepancy. We 
conjecture that these two conclusions will persist in more realistic models of 
discrepancy. 



5 A more realistic model: one dimension 

The model of discrepancy discussed above has the advantages both of sim- 
plicity and dimensionality-independence: but it may not be altogether too 
realistic, in particular because covered modes with high frequency 
sumed to have the same strength as those with low frequency. An alterna- 
tive, which we discuss now, covers all modes, but with strengths that decrease 
with increasing frequency. For simplicity, we start with D — 1. We shall take 

cr2n = o"2n-i = - , n = 1, 2, 3, . . . , (32) 
n 

just the same as in ||. For truly random points we have, then, (s) = 7r 2 /3, 
and we shall assume that we have at our disposal a point set with a discrep- 
ancy value s much lower than this average. First of all, we compute H (s) 
for this small s. In Ref. 0, we performed an exact calculation, but here we 
shall settle for a more simple-minded saddle-point approximation. We as- 
sume that the t integral in Eq.(^) is saturated by a saddle-point lying at 
t = — a 2 /2, that is, 




— ioo 



exp (0(-a 2 /2)) 
^27r0"(-a 2 /2) ' 
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Of \ 

n>0 



rv 



so 2 , , , /l 



-a 2 /2) = ^- - vra + log(2vra) + O 



0'(-a 2 /2) = - s + - + /J 



0"(-«72) = ^ + . (33) 



a 3 Va 4 



The saddle point is seen to correspond to a ~ n/s which is large for small 
values of s, thus justifying the neglect of higher orders in 1/a. The resulting 
form for H is (s <C tt 2 /3) 

, . 7l 2 V2^ ( 7T 2 \ 

ffo(«)~-^-«p^--J , (34) 

in agreement with the corresponding limit of the exact result from || . 

For the evaluation of the error distribution P(s; rj) we must now also 
compute B(t), which involves the unknown coefficients v n of the integrand. 
It is certainly too crude, but nonetheless instructive, to study the simple case 
where 



v l = a l , n = 1,2,3,.... 



In that case, we have 



B(t) = 2Y ^— = 2V — ^— = - + C»fl") , (35) 

where a has now to be determined anew for the saddle point in the t inte- 
gration of Eq. (|23|) . It is seen to be equal to 

a~- , 7 = 1 + ^ • (36) 
Performing the saddle integral we arrive at 



P M ) ~ J^^f-^-1) 



exp — — . (37) 



2vrs V 2s 
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This last, Gaussian, central limit is self-consistently justified from the fact 
that it implies rfN = O (s) which is indeed small by assumption. Note that 
we may write for this case (see Eq.flSID): 



S = V T\ ■ 

What, now, happens for more general v n l One answer is to assume that, 
since the integrand must be quadratically integrable, the sum v\ must 
converge; if we also assume that it has no exceptionally strong higher modes, 
it is reasonable to write 



J 2n-1 ~ u 2n ~ o 



where C is a constant, and the u n are numbers that are not too different 
from unity. Not rigorously, but at least reasonably, we may then write 



BW = £-^~- + 0(l) , (38) 
n z + a 2 a \a z 1 



leading to 



p < Si ">~fe exp [-U • (39) 



The essential point here is that the deviations of the individual uo n from 
unity can give rise, in Bit) to contributions that are of order O (1/a 2 ), and 
not of order O (1/a). Another argument leading to the same conclusion is 
to compute the moments of B(t) over the ensemble of integrands described 
in Refs.0, |3|: the v n are assumed to be normally distributed around zero 
with standard deviation o n . The expectation of B(—a 2 /2) is then, of course, 
just the result of Eq. (|35|) , but its variance goes as O (1/a 3 ). If a increases 
(for decreasing s), the probable values for B(t) therefore cluster together 
more and more closely around the expectation value, again justifying our 
approximations. 

A last example in this context is that of an integrand that has only a 
single mode, with frequency k, so that only f 2 / c and v^k-i are non-vanishing, 
and we have 

Vh 2 
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We immediately find that 



n 2 N 

iss -y> , (4i) 

by the same arguments as above; and, for 77 values smaller than this limit, 
we may again apply the saddle-point method to find 




2s) 

(42) 



where the last line holds if s and s are close in value. In this limit, again 
a Gaussian distribution is obtained, with variance V/(l + tc 2 / k 2 s 2 ) . Note 
that the error improvement now not only depends on the smallness of s but 
also on the number k; this is reasonable because the mode with frequency 
k enters in this particular discrepancy with a factor 1/k 2 so that, when k is 
large, a small value of s does not tell us too much about how well the kth 
mode is integrated by the point set. 



6 Conclusions and outlook 

We have shown that we can define a Central Limit for the case of Quasi- 
Monte Carlo using a suitable definition of the discrepancy. A master-formula 
was derived for the error-distribution density over point sets with a fixed 
discrepancy. 

We have given two simple examples of problem classes and their error- 
distribution densities. These results indicate that the expected error will 
improve if low-discrepancy point sets are used to evaluate integrals. 

We would like to extend these results to more realistic and more dimen- 
sional cases. An explicit result seems to be too far-fectched, at the moment, 
but it might be possible to use saddle-point methods to derive similar results 
for more realistic cases. 
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